Abstract

We identify a sub-class of BQP that captures certain structural commonalities among many 
quantum algorithms including Shor's algorithms. This class does not contain all of BQP (e.g. 
Grover's algorithm does not fall into this class). Our main result is that any algorithm in this 
class that measures at most O(log n) qubits can be simulated by classical randomized polynomial
time algorithms. This does not dequantize Shor's algorithm (as the latter measures n qubits) 
but our work also highlights a new potentially hard function for cryptographic applications. 

Our main technical contribution is (to the best of our knowledge) a new exact characteriza- 
tion of certain sums of Fourier-type coefficients (with exponentially many summands). 



* Supported by NSF CAREER grant CCF-0844796. 



1 Introduction 



One of the key problems in complexity theory is to determine the power of the complexity class BQP. 
Recall that this is the set of languages accepted by uniform polynomial size quantum circuits with 
bounded two-sided error. It is essentially the quantum version of the complexity class BPP. Just 
as BPP corresponds to what is feasible on a classical computer with randomness, BQP corresponds 
to what is feasible on a quantum computer. Whether BQP = BPP is a simple and focused way 
of asking, are quantum computers more powerful than classical ones? Whether actual quantum 
computers are constructed soon or in the distant future, this is a fundamental question. 

Excitement has come from the discovery of quantum algorithms that can outperform classical
algorithms by an exponential or at least quadratic amount. The most famous are Shor's factoring and
discrete logarithm quantum algorithms [Sho97] and Grover's search algorithm [Gro97], plus
real-time simulation of many-body problems and other natural (quantum) processes as motivated by
Feynman [Fey82]. Since these seminal results there have been many new algorithms for
problems in number theory, combinatorics, communication, and information theory that have applied
quantum methods, as well as lower bounds for other problems.

The statement that quantum algorithms can out-perform classical algorithms needs to be examined 
carefully. In some cases this can be proved unconditionally, provided the algorithms are restricted. 
For example, it can be shown that provided both kinds of algorithms are restricted to oracle 
access to a boolean function, Grover's algorithm indeed is faster than any classical algorithm (see 
[DH09]). Such results form an elegant and important class of lower bounds, but without such
restrictions we are still in the dark: we cannot yet prove unconditionally that quantum polynomial
time differs from classical polynomial time, with or without bounded-error randomness. There are other approaches
besides attempting direct lower bounds on BQP: 



1. What are upper bounds for classical simulation of BQP? 

2. What restricted cases allow feasible classical simulation? 



Neither idea is new. Of course the former necessarily involves upper bounds for factoring. The latter 
has a long pedigree as described below, but first we single out two papers that are closest to ours 
in technical domain. DiVincenzo and Terhal [DT04] and Fenner et al. [FGHZ05, FFG+06] studied
constant-depth quantum circuits. These give polynomial-time classical simulations when either
quantum fan-out is restricted ([DT04]) or success is highly amplified ([FGHZ05]), also involving
restrictions on availability or use of ancilla qubits. Our paper studies special kinds of
constant-depth circuits (some technically log-depth in their modeling) that have structures used by the
above algorithms, but liberalizes the success probability, and focuses instead on the number of 
qubits measured to achieve it. 

For upper bounds, Valiant appears to be the first to note that BQP is easily contained in
PSPACE. A much deeper result is that of Adleman, DeMarrais, and Huang [ADH97], who
showed that BQP ⊆ PP. Since then there have been intense attempts to understand where quantum
polynomial time resides. There are plausible conjectures that it is not contained in the polynomial
time hierarchy, PH, but this is unproved (see [Aar10, FU10, AA11]). Since the classical analog,
BPP, is contained in the hierarchy thanks to the beautiful results of [Lau83, Sip83], this would give
evidence that quantum is different. Of course it would not be an unconditional proof, since it is
still possible (even if "unlikely") that polynomial time could equal PSPACE, which would collapse
many of our interesting complexity classes.
Serious work on simulating quantum computations by classical polynomial-time algorithms is often
regarded as beginning with the Gottesman-Knill theorem [Got98] (see also [AG04, AB06, Joz08,
vdN09, vdN10]), which shows that uniform BQP circuits restricted to a certain set of gates accept
only languages in BPP.

Valiant, later, attempted to understand the power of quantum algorithms in a famous paper that 
introduced the concept of matchgates [Val06]. Roughly he viewed quantum algorithms as quantum 
circuits, and allowed only special two-bit gates that were characterized by 4 x 4 matrices in which 
the sixteen entries obeyed a set of algebraic relationships. For these circuits he was able to show 
that they did have classical polynomial-time simulations. This was done by a brilliant insight that
showed that these circuits were related to certain special types of determinants. 

Subsequently, polynomial-time simulations have revolved around Valiant's theory of matchgates
[Val06, CT07, Val08, Val10, GLV11] (cf. [UL08, CT09, CLX10]). They prove that any quantum
circuit based on matchgates uses at most logarithmically many qubits, and thus can be described by
polynomial-sized matrices. This explains in another way why they are efficiently simulated by
classical methods: the unitary matrices are not too large to write down. Recall that in a general
quantum algorithm the matrices are usually exponential in size and no direct simulation is possible.
Indeed even Valiant's initial observation that BQP is contained in PSPACE requires that the
exponentially large matrices be handled in an "implicit manner."

Quantum Background. We now briefly summarize the concepts from quantum computing that
are required to understand this paper. We talk about everything in terms of matrix-vector
operations as in [For03], but allow more-general measurements in the standard basis.

The state of a quantum algorithm on $n$ qubits will be represented by a vector $x \in \mathbb{R}^N$, where
$N = 2^n$. A quantum algorithm can be described as $x$ multiplied by a sequence of $N \times N$ unitary
matrices $U_1, \dots, U_m$ followed by a measurement (which can be followed by some classical
computation). A unitary matrix preserves the $\ell_2$ norm of a vector; i.e., for any $y \in \mathbb{R}^N$ and
an $N \times N$ unitary matrix $U$, we have $\|Uy\|_2 = \|y\|_2$. If the unitary matrix is a permutation then it
corresponds to a deterministic (classical) computation. Finally, given
$\hat{x} = U_m \cdot U_{m-1} \cdots U_1 \cdot x$, a quantum algorithm does some measurements. In particular, given
any subset $R \subseteq [n]$ and an assignment $b \in \{0,1\}^{|R|}$, a measurement of the qubits $R$ to check
whether the bits are equal to the assignment $b$ implies the computation of the quantity
$\sum_{i \in S} |\hat{x}_i|^2$, where $S \subseteq [N]$ is defined as follows. For every $i \in [N]$
assign to it a unique vector $v(i) \in \{0,1\}^n$. Then $S$ is the set of all $i$ such that $v(i)$ projected
down to the indices in $R$ is exactly the vector $\bar{b}$ (where $\bar{b}$ is $b$ with each bit flipped).
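
As a small illustration of the measurement quantity above, the following Python sketch sums the squared amplitudes over the index set $S$ determined by a subset $R$ of qubits and a target pattern. The function name `accept_probability`, the most-significant-bit-first reading of $v(i)$, and the choice to pass the target pattern in directly (rather than fixing whether it is $b$ or its complement) are assumptions made for this example only.

```python
def accept_probability(state, n, R, pattern):
    """Sum |state[i]|^2 over all indices i whose bits at positions R equal pattern.

    state   : list of 2**n amplitudes (the vector after all unitaries are applied)
    R       : list of qubit positions, 0 = most significant bit (assumed convention)
    pattern : list of 0/1 values, one per position in R
    """
    total = 0.0
    for i, amp in enumerate(state):
        # v(i): the n-bit binary expansion of the index i
        bits = [(i >> (n - 1 - q)) & 1 for q in range(n)]
        if all(bits[q] == want for q, want in zip(R, pattern)):
            total += abs(amp) ** 2
    return total

# Uniform superposition on 3 qubits; ask for qubit 0 = 1 and qubit 2 = 0.
n = 3
state = [2 ** (-n / 2)] * (2 ** n)
p = accept_probability(state, n, R=[0, 2], pattern=[1, 0])
```

Here exactly two of the eight basis states (100 and 110) match the pattern, each with squared amplitude 1/8. Note that the loop visits all $N = 2^n$ indices, which is precisely the exponential cost the rest of the paper tries to avoid.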

1.1 Our Approach 

Our approach is based on an observation about many of the known quantum algorithms. Several of
them, including the Deutsch-Jozsa algorithm [Deu85, DJ92, CEMM98], Simon's algorithm [Sim97], and
Shor's algorithm [Sho97], share a structural property. We will define the property in a moment,
but for now let's call them $D^3$-algorithms, D for Deutsch and $D^3$ for describable ∘ deterministic ∘
decomposable.

The property of being a $D^3$-algorithm is based solely on the structure of the quantum circuit. It
is used by many other algorithms, even beyond the ones cited above. However, not all quantum 
algorithms have this structure: the most important one that does not is the quantum search 
algorithm of Grover. We believe this is an important point: if the class of algorithms we plan to
study included all quantum algorithms we would most likely not make progress. But by limiting
our research to a subclass we believe that progress is possible. Further, the fact that so many 
important algorithms have this structural property also means that the class is important. 

Our main result is the following: 

Theorem 1.1. Any polynomial time $D^3$-algorithm that measures at most $O(\log n)$ bits can be
simulated by a randomized polynomial time algorithm.

Since this family of simulated algorithms includes Shor's factoring algorithm, does this mean that
we can factor? No. The issue is the interplay between measurement and success probability.
Forms of this issue already limit the polynomial-time simulation of Fenner et al.
[FGHZ05]. That paper simulates in classical deterministic polynomial time any depth-$d$ quantum
circuit $C$ with the property that for some given (but arbitrary) set $R$ of output qubits, either

• $R$ is measured to be in the all-zero state with probability $\ge 1 - \delta$, or

• $R$ is measured to be in that state with probability $\le \epsilon$,

provided $\delta \le (1 - \epsilon)/2^{2d}$. This is a wide separation; in particular, it requires the probability in
the "accept" case to be close to 1. There has apparently not been much progress in closing it
[FG11]. In contrast, our result holds under the standard (1/3, 2/3) separation, and versions allow
for moderate changes either to the separation or to the number of qubits by which it is achieved. Note
that $R$ corresponds to a much larger set $S$ of coordinates in the underlying $N$-dimensional Hilbert
space, where $N = 2^m$ in the case of $m$ qubits. The number $m - n$ of ancilla qubits allowed
to the circuits on inputs of size $n$ is also an issue, as shown in the followup paper [FFG+06] and
related to the use of fan-out gates in [DT04].



Technical Contributions. We make, we believe, two main technical contributions. The first
is the exact characterization of certain sums. In particular, given a unitary matrix $M$ with certain
properties and any vector $x$, we exactly characterize the $\ell_2$ norm of a certain sub-vector of $Mx$ (which
corresponds to certain quantum measurements) in terms of components of $x$. To the best of our
knowledge, this result is new.

For some instances of $D^3$ algorithms, the matrix $M$ turns out to be a Fourier transform (over $\{-1, 1\}$
or the reals). It is natural to wonder if existing sub-linear time algorithms ([GGI+02, AGS03, Aka10,
Iwe08]) to compute Fourier coefficients could be useful in our context. In Section 5.1, we argue
that these algorithms (at least as a black box) do not lead to efficient simulations of the $D^3$
algorithms, while our exact characterization works.

We believe that this exact characterization is surprising, and potentially of use elsewhere in quantum 
theory as well as elsewhere in theory in general. In hindsight perhaps the exact characterization 
is simple, but initially we had an approximate lemma. Only later did we realize that the value we 
needed could be written down in closed form. 

The other technical contribution is the notion of succinctness that we introduce. This notion 
based on the census idea seems new. While it is created to work perfectly with permutation 
matrices, it will likely be useful elsewhere in complexity theory. There are many studies of notions 
of succinctness and a new one could certainly lead to new results. 
A Cryptographic Implication. The quantum part of Shor's algorithm actually does not
compute the factorization. It computes an intermediate function (by measuring $n$ qubits) and then
computes the factors from that output using ideas from continued fractions. Our algorithm only
measures $O(\log n)$ bits. If we cannot push our positive results to measure $n$ qubits, then this points
to a potentially new hard problem that could be useful for cryptographic applications. In particular,
the (potentially hard) problem would be: given, say, the measurement of $r(n)$ qubits from Shor's
algorithm, compute the factorization. In fact by our result, when $r(n) = O(\log n)$, this problem is
at least as hard as factoring (which of course is a widely used one-way function in practice). We
plan to expand on this question in the final version of the paper.



Organization. We start with some preliminaries in Section 2. We then formally define our
subclass of BQP in Section 3. We prove our main technical result in Section 4 and then use it
to complete our proof of Theorem 1.1 in Section 5. We conclude with some open questions in
Section 6.



2 Preliminaries 

All the definitions we use are standard. We usually use lower case letters for vectors and upper
case for matrices. If $x$ is a vector, then $x_k$ is the $k$-th coordinate of the vector; if $A$ is a matrix, then
$A_{ij}$ is the entry in position $(i,j)$. All our vectors and matrices are in Euclidean space $\mathbb{R}^t$ for some
$t$. The inner product of two vectors is $\langle x, y \rangle$. The 2-norm is $\|x\|_2$. If $x$ is a vector then $x^T$ is its
transpose. Also an elementary vector is just a vector with one 1 and all the rest of its coordinates
0. If $x$ and $y$ are $n$-bit vectors, then $x \cdot y$ is their boolean inner product

$$x_1 y_1 \oplus \cdots \oplus x_n y_n.$$

The notions from quantum computing and complexity theory also are standard. As usual BQP is 
the complexity class of bounded error quantum polynomial time. The complexity class PH is the 
polynomial time hierarchy. 

Some special notation. We always use $N$ to denote $2^n$. Usually $n$ is the number of qubits, which
often involves padding the input size "$n_0$" with additional qubits, but when $n$ is linear or even
polynomial in $n_0$ we can often ignore the distinction. For each quantum circuit $Q$ there is a
corresponding unitary matrix $U_Q$ that corresponds to the action of the circuit.

All vectors are in $\mathbb{R}^N$, although all results generalize to complex vectors. We use $\|x\|_2$ for the $\ell_2$
norm of a vector and $\|x\|_\infty$ for the maximum norm.

We consider vectors and matrices whose entries come from a fixed finite set $F$. For a vector $v \in F^N$
and index $i \le N$ the census $C_v(i)$ is the set of pairs $(a, |\{j \le i : v_j = a\}|)$ giving the number of
previous elements that equal $a$. If $v$ is a binary string then we can set $C_v(i)$ = the number of 1's
in $v(1 \dots i)$. For a matrix $A$, the census $C_A(i,j)$ similarly counts entries in row-major order.

Definition 2.1. A vector or matrix of dimension $N = 2^n$ is succinct if its census is computable
by a circuit of size bounded by a fixed polynomial in n. A subset S of [N] is succinct if its indicator 
string is succinct. 

Certainly these definitions imply the ability to compute the entries v(i) or A(i,j) themselves. The 
reason for the stronger census definition is to allow binary search: 
Lemma 2.2. (a) If $D$ is a succinct $N \times N$ permutation matrix, then there is a fixed polynomial
size circuit that given $i$ computes $d(i)$ where $d : [N] \to [N]$ is the permutation computed by
$D$.

(b) If S and S' are succinct sets of the same cardinality, then there is a succinct permutation d 
that carries S onto S' . 

(c) The product of a succinct matrix and a succinct permutation matrix is succinct. 

Proof. For (a), do binary search on row $i$ implicitly using the row-major string. For (b), given any
$i$ we can first test $i \in S$. If so, then use binary search to compute the census $C_S(i)$, namely the
rank of $i$ in $S$, and use binary search again to find the element of the same rank in $S'$. If not,
apply the same reasoning to the complements of $S$ and $S'$. Finally for (c), to compute entry $(i,j)$
of $A' = DA$, it suffices to compute $A'(i,j) = A(d(i),j)$, since the previous $i - 1$ rows contribute
$i - 1$ to the row-major order census. □
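
The rank-matching argument for part (b) can be sketched in Python as follows. This is only an illustration under stated assumptions: the sets are given as explicit 0/1 indicator lists, the census is computed naively by summation (in the lemma it would be supplied by a polynomial-size circuit), and the names `census`, `select`, and `carry` are hypothetical helpers introduced here.

```python
def census(bits, i):
    """Binary census: the number of 1's among bits[0..i], inclusive."""
    return sum(bits[: i + 1])

def select(bits, rank):
    """Smallest index whose census reaches `rank`, found by binary search.

    Works because the census is non-decreasing in the index.
    Assumes 1 <= rank <= total number of 1's in bits.
    """
    lo, hi = 0, len(bits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if census(bits, mid) >= rank:
            hi = mid
        else:
            lo = mid + 1
    return lo

def carry(S_bits, T_bits, i):
    """Send i to the element of equal rank in T if i is in S,
    and otherwise to the element of equal rank in the complement of T."""
    if S_bits[i]:
        return select(T_bits, census(S_bits, i))
    comp_S = [1 - b for b in S_bits]
    comp_T = [1 - b for b in T_bits]
    return select(comp_T, census(comp_S, i))

S_bits = [1, 0, 1, 0, 0, 1, 0, 0]   # S  = {0, 2, 5}
T_bits = [0, 0, 0, 1, 1, 0, 0, 1]   # S' = {3, 4, 7}
perm = [carry(S_bits, T_bits, i) for i in range(len(S_bits))]
```

The resulting map is a permutation of the index set that carries $S$ onto $S'$; each evaluation uses a logarithmic number of census queries, which is the reason the census (rather than only entry access) is built into the definition of succinctness.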

Lemma 2.3. If D is the matrix of a quantum circuit of fixed polynomial size with only deterministic 
gates drawn from a finite set of bounded-arity gates, then D is succinct. 

Proof. Every elementary quantum gate defines a succinct permutation matrix, and succinctness
of products follows from Lemma 2.2(c). To be careful with concrete versus asymptotic notation,
if we have a $p(n)$-fold product $D_1 \cdots D_{p(n)}$ of permutation matrices each of whose succinctness is
witnessed by circuits of size $q(n)$, then the composition of the permutations is computed by circuits
of size $O(p(n)q(n))$, so the witness for its succinctness stays bounded by a fixed polynomial. □

We will say that we can compute a quantity $q$ to within an additive error $\epsilon$ if we can compute $q'$
so that

$$|q - q'| \le \epsilon.$$

We will be interested in the $\ell_2^2$ norm of sub-vectors:

Definition 2.4. Let $x = (x_1, \dots, x_N) \in \mathbb{C}^N$ and let $S \subseteq [N]$. Then let

$$\mu_S(x) = \sum_{i \in S} |x_i|^2.$$

Finally, we will need the definition of the count of a given pattern in "folded" vectors: 

Definition 2.5. Let $r \ge 1$ be an integer that divides $N$ and let $v = (v_1, \dots, v_r) \in \mathbb{C}^r$. Then for
any $x = (x_1, \dots, x_N) \in \mathbb{C}^N$:

$$\#_v(x) = \left|\{ i \in [N/r] \mid v_j = x_{j \cdot N/r + i} \text{ for every } j \in [r] \}\right|.$$
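
To make the "folded" count concrete, here is a minimal Python sketch. It assumes zero-based indexing (so $j$ ranges over $0, \dots, r-1$ and $i$ over $0, \dots, N/r - 1$), and the name `folded_count` is introduced only for this example.

```python
def folded_count(v, x):
    """#_v(x): number of positions i in [N/r] such that
    v[j] == x[j * (N/r) + i] for every j in [r] (zero-based indices)."""
    r, N = len(v), len(x)
    assert N % r == 0, "r must divide N"
    block = N // r
    return sum(
        all(v[j] == x[j * block + i] for j in range(r))
        for i in range(block)
    )

# N = 8 folded into r = 2 rows of length 4:
#   row 0:  1 -1  1  1
#   row 1: -1 -1  1 -1
# The columns are (1,-1), (-1,-1), (1,1), (1,-1).
x = [1, -1, 1, 1, -1, -1, 1, -1]
counts = {c: folded_count(c, x) for c in [(1, -1), (-1, -1), (1, 1), (-1, 1)]}
```

Viewing $x$ as an $r \times N/r$ array, the count is simply the number of columns equal to the pattern $v$, so the counts over all patterns sum to $N/r$.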

3 The Structural Property 

In this section we will explain the property that defines the class of $D^3$ quantum algorithms.
Consider a polynomial size quantum circuit that consists of three stages: the $A$, the $\mathcal{D}$, and the
$B$ (see Figure 1). This is a $D^3$-circuit provided:

Figure 1: Our quantum circuit model



1. The circuit A defines a succinct unitary matrix, namely one with a special small circuit 
description. 

2. The circuit $\mathcal{D}$ consists of only deterministic gates.

3. The circuit B defines a decomposable type unitary matrix. 

Further we insist that the entries of the matrix $A$ come from a finite
set of values. For a family of matrices with describing circuits of size $g(n)$, we say the matrices are
$g(n)$-succinct. (If not mentioned otherwise, we will call a matrix succinct if $g(n)$ = poly$(n)$.)

Many matrices are succinct. For example Hadamard matrices are succinct, and so are products of
two Hadamard matrices, for any $g(n)$ above some linear growth.

For now we subsume the distinction between input and ancilla qubits and suppose $m = n$, or at
least $m = \Theta(n)$. We will overload notation and think of the rows and columns being indexed by
$\{0,1\}^n$ or $[N] \stackrel{\mathrm{def}}{=} \{0, \dots, N-1\}$. Given a state vector $z \in \mathbb{C}^N$ resulting from the computation
and a set $S \subseteq [N]$ of coordinates corresponding to "accept" for a measurement in the standard
basis, we wish to compute or approximate the quantity $\mu_S(z)$ from Definition 2.4, which gives the
quantum acceptance probability.

Definition 3.1. Let $S \subseteq [N]$ be succinct and have cardinality $|S| = N/r$ for some integer $r \ge 1$.
Let $v = (v_1, \dots, v_r) \in \mathbb{C}^r$ be some fixed vector. An $N \times N$ matrix $M$ is called $(S,v)$-decomposable
if the following property is satisfied. Let the row-submatrix of $M$ indexed by the rows in $S$ be given
by $(M_0, \dots, M_{r-1})$, where each $M_i$ is an $N/r \times N/r$ matrix. Then there exists a unitary $N/r \times N/r$
matrix $U$ such that for each $i \in [r]$, $M_i = v_i \cdot U$.

We observe that provided $S$ is succinct, this definition allows us a general range of $S$ when building
a $D^3$ circuit. Namely, suppose the components $A$ and $\mathcal{D}$ are originally designed with regard to a
different measurement set $S'$ of the same cardinality. We can then surround them by permutation
matrices carrying $S$ onto $S'$, and the result is still a $D^3$ circuit by Lemma 2.2. Thus it does not
matter that our key definition is worded to depend on the set $S$, and there is freedom to choose an
$S$. Note that the decomposability definition depends only on the corresponding matrix rows. We
note that the sets $S$ resulting from important examples are always succinct, and this is a reasonable
stipulation for measuring a quantum circuit of size poly$(n)$ in general.

We show that the set of decomposable matrices is non-empty by showing that two well-known
families of unitary matrices are decomposable. The first, $H^n$, contributes depth only 1 in a quantum
circuit, but the second is reckoned as log-depth, so our $D^3$ notion is incomparable with previous
definitions of constant-depth quantum circuits.



Hadamard Matrices. We begin with the definition of Hadamard matrices: 
Definition 3.2. The Hadamard matrix $H^n$ of order $N = 2^n$ is defined recursively as follows

$$H^n = \frac{1}{\sqrt{2}} \begin{pmatrix} H^{n-1} & H^{n-1} \\ H^{n-1} & -H^{n-1} \end{pmatrix},$$

where $H^0 = (1)$.

It is well-known that Hadamard matrices are unitary. 

If one unravels the recursion above for $i$ steps, then every consecutive "chunk" of $N/2^i$ rows of $H^n$
is of the form

$$2^{-i/2} \left( a_0 \cdot H^{n-i} \;\; a_1 \cdot H^{n-i} \;\; \cdots \;\; a_{2^i-1} \cdot H^{n-i} \right),$$

where each $a_j \in \{-1, 1\}$. This implies that

Proposition 3.3. Let $H^n$ be the $N \times N$ Hadamard matrix and let $S = \{j \cdot N/2^i, \dots, (j+1) N/2^i - 1\}$
for some $i \ge 1$ and $j \in [2^i]$. Then $H^n$ is $(S, (v_0, \dots, v_{2^i-1}))$-decomposable where for each $j \in [2^i]$,
$v_j \in \{-2^{-i/2}, 2^{-i/2}\}$.
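
The chunk structure behind Proposition 3.3 can be checked numerically on small instances. The sketch below is only a sanity check, not part of the proof; the helper names `hadamard` and `chunk_vector` are introduced here, and indices are zero-based.

```python
import math

def hadamard(n):
    """Normalized 2^n x 2^n Hadamard matrix built by the recursion of Definition 3.2."""
    H = [[1.0]]
    s = 1 / math.sqrt(2)
    for _ in range(n):
        top = [[s * a for a in row] * 2 for row in H]
        bottom = [[s * a for a in row] + [-s * a for a in row] for row in H]
        H = top + bottom
    return H

def chunk_vector(n, i, j):
    """If chunk j (of N/2^i consecutive rows) of H^n equals
    (v_0 * H^{n-i}, ..., v_{2^i - 1} * H^{n-i}), return the vector v; else None."""
    H, U = hadamard(n), hadamard(n - i)
    m = 2 ** (n - i)                          # chunk height and block width
    rows = H[j * m:(j + 1) * m]
    v = []
    for blk in range(2 ** i):
        scale = rows[0][blk * m] / U[0][0]    # U[0][0] is positive
        matches = all(
            math.isclose(rows[a][blk * m + b], scale * U[a][b], abs_tol=1e-12)
            for a in range(m) for b in range(m)
        )
        if not matches:
            return None
        v.append(scale)
    return v

v = chunk_vector(3, 2, 3)   # last chunk of 2 rows of the 8 x 8 Hadamard matrix
```

For $n = 3$ and $i = 2$ each entry of the returned vector has absolute value $2^{-i/2} = 1/2$, in agreement with the proposition.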

Fourier Matrices. We begin with the definition of quantum Fourier matrices: 
Definition 3.4. The Fourier matrix $F^N$ of order $N$ has as its $(i,j)$th entry $\frac{1}{\sqrt{N}} \cdot \omega_N^{ij}$, where $\omega_N$ is
the $N$th root of unity and $i,j \in [N]$.

It is well-known that $F^N$ is a unitary matrix over $\mathbb{C}$.

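Let $k$ divide $N$. Then it is easy to see that $\omega_N^k = \omega_{N/k}$ is the $N/k$th root of unity. Further, note
that for any $i = k\ell$ for some $\ell \in [N/k]$, we have that the $i$th row of $F^N$ is given by

$$\frac{1}{\sqrt{N}} \cdot \left( \omega_N^{k\ell \cdot 0}, \omega_N^{k\ell \cdot 1}, \dots, \omega_N^{k\ell \cdot (N-1)} \right),$$

which is the same as

$$\frac{1}{\sqrt{N}} \cdot \left( \omega_{N/k}^{\ell \cdot 0}, \dots, \omega_{N/k}^{\ell \cdot (N/k-1)}, \; \dots, \; \omega_{N/k}^{\ell \cdot 0}, \dots, \omega_{N/k}^{\ell \cdot (N/k-1)} \right).$$

In other words the $k\ell$th row in $F^N$ is the $\ell$th row of $F^{N/k}$ repeated $k$ times (and multiplied by a
factor of $1/\sqrt{k}$). In other words, the rows of $F^N$ whose indices are divisible by $k$ form the matrix

$$\frac{1}{\sqrt{k}} \cdot \left( F^{N/k} \;\; F^{N/k} \;\; \cdots \;\; F^{N/k} \right).$$

This implies the following result:

Proposition 3.5. Let $F^N$ be the Fourier matrix of order $N$. Let $k$ be an integer dividing $N$ and
let $S = \{i \in [N] \mid i \bmod k = 0\}$. Then $F^N$ is $\left(S, \frac{1}{\sqrt{k}}(\underbrace{1, 1, \dots, 1}_{k \text{ times}})\right)$-decomposable.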
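
The row structure used for Proposition 3.5 can likewise be checked numerically. The following Python sketch assumes the primitive root $\omega_N = e^{2\pi \mathrm{i}/N}$ (the sign convention does not affect the check) and zero-based indices; the function names are introduced only for this example.

```python
import cmath
import math

def fourier(N):
    """N x N Fourier matrix with (i, j) entry omega_N^{ij} / sqrt(N)."""
    w = cmath.exp(2j * cmath.pi / N)
    return [[w ** (i * j) / math.sqrt(N) for j in range(N)] for i in range(N)]

def rows_divisible_by_k_match(N, k):
    """Check that rows 0, k, 2k, ... of F^N form (1/sqrt(k)) * (F^{N/k} ... F^{N/k})."""
    assert N % k == 0, "k must divide N"
    F, G = fourier(N), fourier(N // k)
    m = N // k
    return all(
        cmath.isclose(F[k * l][blk * m + j], G[l][j] / math.sqrt(k), abs_tol=1e-9)
        for l in range(m) for blk in range(k) for j in range(m)
    )

ok = rows_divisible_by_k_match(12, 3)
```

The check passes because $\omega_N^{k\ell(b \cdot N/k + j)} = \omega_{N/k}^{\ell j}$ for every block index $b$, which is exactly the periodicity used in the text.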
3.1 Instances of $D^3$ quantum algorithms



We will show soon why such matrices are interesting from the perspective of simulating a $D^3$
polynomial time quantum algorithm. For now we note:

1. In the Deutsch-Jozsa algorithm, $A$ and $B$ are $H^{n+1}$ and $H^n$ (with one ancilla qubit), while
$\mathcal{D}$ is the deterministic function $f$ plus $y \oplus f(x)$ on the ancilla.

2. In Shor's algorithm, as presented in the first diagram on $3n$ qubits in [Bea03], $A$ is $H^{2n}$ with
constant-1 on the last $n$ qubits, $B$ is the inverse Quantum Fourier Transform, and $\mathcal{D}$ consists of
applications of $x \mapsto a^{2^j} x \bmod M$ on the last $n$ qubits, controlled from the first $2n$.
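
The first instance can be traced on a small state vector. The sketch below is a brute-force illustration of the three stages for Deutsch-Jozsa, not the efficient simulation of this paper; it assumes the standard starting state $|0 \cdots 0\rangle|1\rangle$ with the ancilla as the least significant bit, and the function names are hypothetical.

```python
import math

def apply_hadamard(state, qubit, total):
    """Apply a single-qubit Hadamard gate to `qubit` (0 = most significant)."""
    out = state[:]
    bit = 1 << (total - 1 - qubit)
    s = 1 / math.sqrt(2)
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            out[i], out[i | bit] = s * (a + b), s * (a - b)
    return out

def deutsch_jozsa_zero_probability(f, n):
    """Probability that the n input qubits are measured as all zeros."""
    total = n + 1
    state = [0.0] * (2 ** total)
    state[1] = 1.0                           # |0...0>|1>  (assumed start state)
    for q in range(total):                   # stage A: H^{n+1}
        state = apply_hadamard(state, q, total)
    permuted = [0.0] * len(state)            # stage D: |x, y> -> |x, y XOR f(x)>
    for i, amp in enumerate(state):
        x, y = i >> 1, i & 1
        permuted[(x << 1) | (y ^ f(x))] = amp
    state = permuted
    for q in range(n):                       # stage B: H^n on the input qubits
        state = apply_hadamard(state, q, total)
    return abs(state[0]) ** 2 + abs(state[1]) ** 2

p_constant = deutsch_jozsa_zero_probability(lambda x: 1, 3)
p_balanced = deutsch_jozsa_zero_probability(lambda x: x & 1, 3)
```

The middle stage is a pure permutation of basis states, which is what "deterministic" means for $\mathcal{D}$; a constant $f$ yields acceptance probability 1 and a balanced $f$ yields 0, up to floating-point error.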

4 An Exact Characterization 

In this section, we show that if each entry of a vector $x$ takes some fixed discrete values, then there
is an exact characterization of $\mu_S(Mx^T)$, where $M$ is $(S, \cdot)$-decomposable.

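Lemma 4.1. Let $\Sigma \subset \mathbb{C}$ be a finite set. Let $r \ge 1$ be an integer such that $r$ divides $N$. Let
$S \subseteq [N]$ with $|S| = N/r$ and let $v \in \mathbb{C}^r$ be a fixed vector. Let $M$ be an $N \times N$ matrix that is
$(S,v)$-decomposable. Then for every $x \in \Sigma^N$, we have

$$\mu_S(Mx^T) = \sum_{c \in \Sigma^r} |\langle c, v \rangle|^2 \cdot \#_c(x).$$

Proof. Let $U$ be the $N/r \times N/r$ unitary matrix such that the row sub-matrix of $M$ corresponding to
$S$ is given by $(M_0, \dots, M_{r-1})$ such that for each $i \in [r]$, $M_i = v_i \cdot U$. Further, let
$x = (x^0, \dots, x^{r-1}) \in (\Sigma^{N/r})^r$. It is easy to check that

$$(Mx^T)_S = M_0 (x^0)^T + \cdots + M_{r-1} (x^{r-1})^T, \qquad (1)$$

where for a vector $y \in \mathbb{C}^N$, $y_S$ denotes the vector projected to indices in $S$. Consider the following
sequence of relationships:

$$\mu_S(Mx^T) = \left\| (Mx^T)_S \right\|_2^2 \qquad (2)$$

$$= \left\| \sum_{i=0}^{r-1} M_i (x^i)^T \right\|_2^2 \qquad (3)$$

$$= \left\| U \cdot \sum_{i=0}^{r-1} v_i (x^i)^T \right\|_2^2 \qquad (4)$$

$$= \left\| \sum_{i=0}^{r-1} v_i (x^i)^T \right\|_2^2 \qquad (5)$$

$$= \sum_{j=0}^{N/r-1} \left| \sum_{i=0}^{r-1} v_i x^i_j \right|^2 \qquad (6)$$

$$= \sum_{c \in \Sigma^r} |\langle c, v \rangle|^2 \cdot \#_c(x). \qquad (7)$$

In the above, (2) follows from definitions. (3) follows from (1). (4) follows from the definition of the $M_i$'s.
(5) follows from the fact that $U$ is a unitary matrix. (6) follows from stating $x^i = (x^i_0, \dots, x^i_{N/r-1})$, and
finally, (7) follows by re-arranging the sum, the fact that $x \in \Sigma^N$ and by Definition 2.5.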



□ 
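
The identity of Lemma 4.1 is easy to test numerically on a small instance. The Python sketch below compares the direct computation of $\mu_S(Mx^T)$ with the pattern-count formula for the Hadamard matrix and $S$ equal to the bottom half of the rows, where Proposition 3.3 gives $v = (2^{-1/2}, -2^{-1/2})$. The helper names, the zero-based indexing, and the choice of test instance are assumptions of this example.

```python
import itertools
import math
import random

def hadamard(n):
    """Normalized 2^n x 2^n Hadamard matrix (recursion of Definition 3.2)."""
    H = [[1.0]]
    s = 1 / math.sqrt(2)
    for _ in range(n):
        top = [[s * a for a in row] * 2 for row in H]
        bottom = [[s * a for a in row] + [-s * a for a in row] for row in H]
        H = top + bottom
    return H

def mu_direct(M, x, S):
    """mu_S(M x^T): sum of squared magnitudes of the entries of M x^T indexed by S."""
    return sum(abs(sum(M[i][j] * x[j] for j in range(len(x)))) ** 2 for i in S)

def mu_by_counts(v, x, alphabet):
    """Right-hand side of Lemma 4.1: sum over patterns c of |<c, v>|^2 * #_c(x)."""
    r, block = len(v), len(x) // len(v)
    total = 0.0
    for c in itertools.product(alphabet, repeat=r):
        count = sum(
            all(c[j] == x[j * block + i] for j in range(r)) for i in range(block)
        )
        total += abs(sum(cj * vj for cj, vj in zip(c, v))) ** 2 * count
    return total

n = 4
N = 2 ** n
rng = random.Random(1)
x = [rng.choice([-1, 1]) for _ in range(N)]
s = 1 / math.sqrt(2)
lhs = mu_direct(hadamard(n), x, range(N // 2, N))   # bottom half of the rows
rhs = mu_by_counts([s, -s], x, [-1, 1])
```

The two values agree up to floating-point error. The direct computation touches all of $M$, while the right-hand side depends on $x$ only through the $|\Sigma|^r$ pattern counts, which is what makes sampling possible in Section 5.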



5 Approximating $\mu_S(Mx^T)$



In this section, we show how to approximate $\mu_S(Mx^T)$ quickly using Lemma 4.1 when $x$ takes
values in a bounded set within the reals and is succinct.

Given the exact characterization from Lemma 4.1, the algorithm to approximate $\mu_S(Mx^T)$ for
succinct $x$ follows by the natural sampling algorithm. (The details are in Appendix A.)

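Theorem 5.1. Let $\epsilon > 0$ be a real. Let $\Sigma \subset \mathbb{R}$ be a finite set and let $A = \max_{b \in \Sigma} |b|$. Let $r \ge 1$
be an integer such that $r$ divides $N$. Let $S \subseteq [N]$ with $|S| = N/r$ and let $v \in \Gamma^r$ be a fixed vector,
where $\Gamma \subset \mathbb{R}$ with $B = \max_{a \in \Gamma} |a|$. Let $M$ be an $N \times N$ matrix that is $(S,v)$-decomposable.

Then for every succinct $x \in \Sigma^N$, there exists an
$O\left( \frac{r^6}{\epsilon^2} \cdot |\Sigma|^{2r} \log|\Sigma| \cdot (|A||B|)^4 \cdot \log^{O(1)} N \right)$-time
randomized algorithm that outputs an estimate $\tilde{\mu}$ such that with probability at least 2/3,

$$|\tilde{\mu} - \mu_S(Mx^T)| \le \epsilon \cdot \frac{N}{r}.$$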

Next, we state a corollary that will be useful. 

Corollary 5.2. Let $\epsilon > 0$ be a real. Let $\Sigma \subset \mathbb{R}$ be a finite set and let $A = \max_{b \in \Sigma} |b|$. Let $r \ge 1$
be an integer such that $r$ divides $N$. Let $S \subseteq [N]$ with $|S| = N/r$ and let $v \in \Gamma^r$ be a fixed vector,
where $\Gamma \subset \mathbb{R}$ with $B = \max_{a \in \Gamma} |a|$. Let $M$ be an $N \times N$ matrix that is $(S,v)$-decomposable.

Let $|A|, |B| \le O(1)$ and $r \le O(\log\log N)$. Then for every succinct $x \in \Sigma^N$, one can approximate
$\mu_S(Mx^T)$ to within an additive $\epsilon N/r$ factor in randomized poly$(\log N)$ time.



In particular, by setting $r = O(\log\log N) = O(\log n)$, Corollary 5.2 implies Theorem 1.1.



5.1 Sub-linear Time Fourier Transforms 



We now compare our techniques with the existing ones on approximating Fourier coefficients. If $M$
is the Hadamard matrix (and the Fourier matrix resp.), then $M \cdot x^T$ is the Fourier transform
of $x$ over $\{0,1\}^n$ (and $\mathbb{C}^N$ resp.). Further, there exist sub-linear algorithms that can estimate any
Fourier coefficient of $x$ to within an additive $\gamma > 0$ error in time poly$(\log N, 1/\gamma)$.$^1$ This was implicitly
done for the Hadamard matrix in the work of Goldreich and Levin [GL89]. The result for the Fourier
matrix was done by Gilbert et al. [GGI+02]. This was improved upon by Akavia et al. [AGS03].
(There are now deterministic algorithms known to estimate the Fourier coefficients [Aka10, Iwe08].)

$^1$ Actually, these algorithms can compute Fourier coefficients with mass $\tau \cdot \|Mx^T\|_2$ to within additive error $\epsilon > 0$
for any $\tau, \epsilon > 0$ in time poly$(\log N, 1/\tau, 1/\epsilon)$. The claimed result follows by just setting $\tau = \epsilon = \gamma$.

To estimate $\mu_S(Mx^T)$ one could try to use the sub-linear algorithms above to estimate the Fourier
coefficients indexed by $S$. There are two issues with this approach: (i) If one added the estimates
of $\{(Mx^T)_j\}_{j \in S}$, then to get an overall error of $\epsilon$ one would have to set $\gamma = O(\epsilon/|S|)$; this would
lead to a run time that is poly$(|S|)$. (ii) Further, even if one had perfect estimates of the Fourier
coefficients in $\{(Mx^T)_j\}_{j \in S}$, even adding them up would take $\Omega(|S|)$ time. In our simulations, we
have $|S| = \tilde{\Omega}(N)$, which of course is too inefficient. Alternatively, one could use the fact that these
estimation algorithms can estimate the Fourier coefficients with at least a $\tau$ fraction of the total mass.
However note that we can have $\mu_S(Mx^T)$ be $\Omega(1)$ while all the Fourier coefficients themselves have
$O(1/|S|)$ of the mass. Since the sub-linear algorithms to estimate the top Fourier coefficients have
polynomial dependence on $1/\tau$, this method can be prohibitively expensive.

It is perhaps instructive to see how our simulation avoids the pitfalls above. The most obvious
reason is that Lemma 4.1 gives a direct characterization of the quantity $\mu_S(Mx^T)$ that we want
to estimate. However, we do not compute $\mu_S(Mx^T)$ exactly but estimate it by a simple
sampling procedure. In particular, we show that we estimate each of the summands in the equality in
Lemma 4.1 with accuracy $\gamma > 0$ and then take the union bound. However, unlike the case in the
previous paragraph, we have to take the union bound over $\exp(N/|S|)$ events. In our simulations, we
use $|S| \ge \Omega(N/\log\log N)$, which implies that we need to set $\gamma = \epsilon \exp(-\Omega(r))$. Since our run time is
poly$(1/\gamma)$, by our choice of $|S|$ we obtain an efficient poly$(\log N)$ runtime overall.



6 Open Problems 

Our work raises several interesting questions. Can Theorem 1.1 be improved to allow the
measurement of more qubits? We can increase the number of qubits to $O(\log n)$ giving only a (1/3, 2/3)
separation and still have the simulation take polynomial randomized time.

Problem 1. What is the exact classical cost of simulating any $D^3$-algorithm that measures $r$ bits?

Our current result shows that the cost is upper-bounded by $2^{O(r)}$ times the cost of running the
circuit. We will investigate whether this bound can be improved.

We note that our techniques and those in the sub-linear algorithms to estimate Fourier coefficients
are complementary: our results work better when $|S|$ is large whereas other techniques work better
when $|S|$ is small. An intriguing possibility is:

Problem 2. Can we design a simulation algorithm for $D^3$ quantum polynomial time algorithms
that has the best of both worlds?

As a technical matter, we may also ask where and why the $D^3$ definition may not allow standard
transformations that establish robustness properties of quantum circuits, such as measuring just
one qubit and/or deferring measurements. The strengthened succinctness notion is a non-uniform
version of the idea of polynomial-time rankability [GS91, HR90] with different scaling, which may
provide further scope for studying it.

The final open question is the possibility of a new hard function for cryptographic applications 
coming out of this work. 

Problem 3. Can we utilize the following as a hard problem for cryptographic applications: Given
$r(n)$ (say $O(\log n)$) qubits from the quantum part of Shor's algorithm, compute the complete
factorization.
A Proof of Theorem 5.1



Proof. Let $\ell \ge 1$ be a parameter that will be fixed later to be
$\ell = O\left(1/\epsilon^2 \cdot |\Sigma|^{2r} \cdot r^5 \cdot (|A| \cdot |B|)^4 \cdot \log|\Sigma|\right)$.
Consider the following simple sampling algorithm:

1. Set $\rho \leftarrow 0$.

2. Repeat $\ell$ times:

• Pick a random $i \in [N/r]$.

• Set $\rho \leftarrow \rho + \left( \sum_{j=0}^{r-1} v_j \cdot x^j_i \right)^2$.

3. Output $\delta = \rho/\ell$.

We will argue shortly that $\delta$ approximates $\frac{\mu_S(Mx^T)}{N/r}$ to within an additive factor of $\epsilon$. First, we
bound the run time of the algorithm. It is easy to see that step 2 dominates the run time.$^2$
Each of the $\ell$ repetitions involves reading $r$ values $x^j_i$ (each of which takes poly$(\log N)$ time as $x$
is succinct) and performing $O(r)$ additions and multiplications. Altogether, the algorithm takes
$O(\ell(r + r \cdot \log^{O(1)} N))$ time, which with our choice of $\ell$ implies the claimed running time.

We now argue that the probability that $|\delta - \mu_S(Mx^T)/(N/r)| > \epsilon$ is at most 1/3. Assigning the
estimate $\tilde{\mu} = \delta \cdot N/r$ would then complete the proof.

$^2$ We'll make the simplifying assumption that all basic arithmetic operations are O(1) time.

For every $c \in \Sigma^r$, let $\delta_c$ be the random variable denoting the fraction of the indices $i \in [N/r]$ chosen
in Step 2 such that $(x^0_i, \dots, x^{r-1}_i) = c$. Note that

$$\delta = \sum_{c \in \Sigma^r} \delta_c \langle c, v \rangle^2. \qquad (8)$$

We will show that for every $c \in \Sigma^r$, with probability at most $\frac{1}{3|\Sigma|^r}$,

$$\left| \delta_c - \frac{\#_c(x)}{N/r} \right| > \frac{\epsilon}{|\Sigma|^r \cdot (r|A||B|)^2}. \qquad (9)$$

Since $\langle c, v \rangle \le r|A||B|$, we have with probability at most $\frac{1}{3|\Sigma|^r}$,

$$\langle c, v \rangle^2 \cdot \left| \delta_c - \frac{\#_c(x)}{N/r} \right| > \frac{\epsilon}{|\Sigma|^r}. \qquad (10)$$

Taking the union bound over all the choices of $c$, (10), (8) and Lemma 4.1 imply that with probability
at most 1/3,

$$\left| \delta - \frac{\mu_S(Mx^T)}{N/r} \right| > \epsilon,$$

as desired.

To complete the proof, we argue (9). First, note that

$$\mathbb{E}[\delta_c] = \frac{\#_c(x)}{N/r}.$$

Thus, by the (additive) Chernoff bound, the probability that $|\delta_c \ell - \mathbb{E}[\delta_c \ell]| > \alpha\ell$ is upper bounded
by

$$\exp(-2\alpha^2 \ell) \le \frac{1}{3 \cdot |\Sigma|^r},$$

where the inequality follows if

$$\ell \ge \frac{1}{2\alpha^2} \cdot \ln\left(3|\Sigma|^r\right).$$

Noting that to prove (9), we need to pick $\alpha = \frac{\epsilon}{|\Sigma|^r (r|A||B|)^2}$, we need

$$\ell \ge \frac{|\Sigma|^{2r}}{2\epsilon^2} \cdot (r|A||B|)^4 \log\left(3|\Sigma|^r\right),$$

which is satisfied if we pick

$$\ell = O\left(1/\epsilon^2 \cdot |\Sigma|^{2r} \cdot r^5 \cdot (|A| \cdot |B|)^4 \cdot \log|\Sigma|\right).$$

□ 
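
The sampling procedure analyzed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the callable `x_entry` stands in for the succinct description of $x$ (any function returning the $k$-th entry), indices are zero-based, and the number of samples is passed in directly instead of being derived from the bound on $\ell$.

```python
import math
import random

def estimate_mu(x_entry, N, v, samples, rng=random):
    """Estimate mu_S(M x^T) for an (S, v)-decomposable M by random sampling.

    x_entry(k) returns the k-th entry of x; v has length r, and r must divide N.
    """
    r = len(v)
    assert N % r == 0, "r must divide N"
    block = N // r
    rho = 0.0
    for _ in range(samples):
        i = rng.randrange(block)                       # random i in [N/r]
        inner = sum(v[j] * x_entry(j * block + i) for j in range(r))
        rho += inner ** 2
    delta = rho / samples                              # estimates mu / (N/r)
    return delta * block                               # rescale by N/r

# All-ones input with v = (1/sqrt(2), 1/sqrt(2)): every sample contributes
# (2/sqrt(2))^2 = 2, so the estimate is about 2 * (N/r) = 8 for N = 8.
s = 1 / math.sqrt(2)
est = estimate_mu(lambda k: 1, N=8, v=[s, s], samples=50)
```

Each iteration reads only $r$ entries of $x$, so the cost is independent of $N$ apart from the time needed to evaluate an entry; the matrix $M$ itself is never touched, since by Lemma 4.1 only $v$ and the column patterns of the folded $x$ matter.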






